Abstract: This paper proposes an efficient parameterization
of the Room Transfer Function (RTF). Typically, the RTF
varies rapidly with source and receiver positions, and hence
requires an impractical number of point-to-point measurements
to characterize a given room. Therefore, we derive a novel
RTF parameterization that is robust to both receiver and
source variations with the following salient features: (i) The
parameterization is given in terms of a modal expansion of 3D
basis functions, (ii) The aforementioned modal expansion can be
truncated at a finite number of modes given that the source and
receiver locations are from two sizeable spatial regions, which are
arbitrarily distributed, (iii) The parameter weights/coefficients
are independent of the source/receiver positions. Therefore, a
finite set of coefficients is shown to be capable of accurately
calculating the RTF between any two arbitrary points from a
pre-defined spatial region where the source(s) lie and a pre-defined
spatial region where the receiver(s) lie. A practical method
to measure the RTF coefficients is also provided, which only
requires a single microphone unit and a single loudspeaker unit,
given that the room characteristics remain stationary over time.
The accuracy of the above parameterization is verified using
appropriate simulation examples.

I. Introduction 

The room transfer function (RTF) describes the collective
effect of multipath propagation of sound between a
source and a receiver within a given room enclosure. Accurate
modeling of the RTF is useful in soundfield simulators as
well as many other applications such as sound reproduction,
soundfield equalization, echo cancellation, and speech
dereverberation. These applications use appropriate RTF
deconvolution methods to cancel the effects of room reflections
(reverberation), and therefore are highly dependent on the
accuracy of the RTF model.

The theoretical solution to the RTF based on the Green's
function [1] was derived assuming a strict rectangular room
geometry. It can only be applied to highly idealised cases with
reasonable effort. The rooms with which we are concerned in
our daily life, however, are more or less irregular in shape, and
the formulation of irregular boundary conditions requires
extensive numerical calculations. For this reason, the immediate
application of the classical model to practical problems in
room acoustics is limited.

In practice, RTFs are usually estimated as FIR filters, or as 
parametric equations based on the geometrical properties of 
the room. In the FIR filter approach, the RTF is assumed to 
behave as a linear time-invariant system, and then modeled 
as either an all-zero, all-pole, pole-zero H or a common 
pole-zero m system. The coefficients of these models are 
estimated as variable parameters of the RTF, and since the
RTF is extremely sensitive to source and receiver variations,
the coefficients too experience a similar sensitivity El. This 
problem not only requires repetitive parameter calculations
with varying source/receiver positions, but also demands
adaptive inverse filters with cumbersome processing algorithms
during equalization a, 0 . Furthermore, in practice,
the time invariant aspect of room acoustics is far from reality 
0, which remains as a fundamental weakness of the time- 
invariant filter model. 

In contrast, the geometric room acoustics model relies heavily
on the room geometry and ray optic methods borrowed
from computer graphics. The first geometric model for room 
reverberation was introduced by Allen and Berkley 0. This 
work became the basis for many subsequent geometric models 
and is based on the notion that reverberation can be represented 
as the effect of an infinite number of image sources that are 
created by reflecting the true acoustic source in room walls. A 
faster algorithm to evaluate the image source method for single 
source-multiple receiver applications was later introduced in 
0 using the multipole expansion. Other common geometric 
models include ray tracing 0, beam tracing 0, acoustic 
radiosity [10], and Finite Difference Time Domain (FDTD)
HD, na methods. Even though these techniques have certain 
similarities, their theoretical foundations are often unique for 
each method. For example, the ray tracing method assumes 
high operating frequencies while the FDTD method assumes 
a low-mid frequency bandwidth Hlfl Therefore, their applicability
to a general room is quite limited. More generalized
geometric models incorporating multiple specialized models 
were recently introduced in H3-H3. However, due to the 
lack of precision in reflection methods, and the vast variation
of room geometries available, an exact estimation of the RTF
based on geometrical properties remains unresolved.

Due to the inefficiency of existing RTF models, alternative 
equalization techniques tend to measure the RTF at a finite set 
of points, which are later incorporated into the sound processing
algorithm directly Hsi-na. However, as explained earlier, 
even a small-scale variation in source/receiver positions results 
in a drastic variation in the RTF HD, and therefore, the above 
method only gives accurate results at the design points, while 
the performance degradation elsewhere is too significant.
Additional limitations are caused by the inaccuracies
involved with the point-to-point RTF measurements. Recent work
on improving the RTF measurement via modified source and
receiver directivity patterns includes [20]-[23].

* At high frequencies, the computational cost is too high due to the increased
number of points (small wavelengths).

A complete equalization solution that is robust to receiver 
point variations was first proposed in [24] for 2D applications,
which exploits a novel RTF model based on the harmonic 
solution to the wave equation. This model parameterizes the 
RTF between a fixed source and any arbitrary point within 
a source-free receiver region in terms of a weighted sum of 
2D basis functions, while the weights need to be separately 
measured. Thus, the successful extraction of a finite set of 
parameter weights/coefficients enables RTF characterization 
between a given source location and any arbitrary point within 
a given receiver region. However, these coefficients remain 
unique to the source location of interest, and therefore, the 
slightest variation in source positioning requires a new set of 
RTF parameters to be measured. 

In this paper, we introduce an efficient RTF parameterization
in 3D that is robust to both receiver and source variations, so
that the extraction of a finite set of coefficients is sufficient
to characterize an entire room enclosure of interest. In other
words, we derive a 3D model which characterizes the RTF
between any two arbitrary points from a primary spatial region
where the source(s) lie and a secondary spatial region where
the receiver(s) lie. More importantly, we impose no restrictions
on the geometrical configuration of the source and receiver
regions, and as a result, the proposed parameterization is valid
for any two arbitrary points from the given room. Following
[24], this parameterization is based on the harmonic solution
to the wave equation, and therefore is derived in terms of
a weighted sum of 3D basis functions. Furthermore, it only
requires a minimum of $(N_s + 1)^2 (N_r + 1)^2$ coefficients to
characterize the RTF over an $N_s$-th order source region and
an $N_r$-th order receiver region^1. We also provide a practical
method to extract the aforementioned coefficients, which only
requires RTF measurements over a finite set of source-receiver
combinations and associated numerical processing. Given that the
room characteristics remain stationary over time, these
measurements can be obtained using a single microphone unit and
a single loudspeaker unit.

The paper is structured as follows. In Sec. II we first
decompose the room response into direct and reverberant
components, where the former is known and the latter is unknown.
The unknown reverberant component is then parameterized in
terms of a weighted sum of 3D basis functions. In Sec. III
we describe a robust method to obtain the parameter weights,
which only requires a finite set of RTF measurements. Finally,
in Sec. IV we demonstrate the accuracy of the proposed
parameterization, by comparing it with a simulated room based 
on the image source model. This section also presents an error 
analysis performed over a broadband frequency range. 

II. Parameterization of the Room Transfer Function

A. Problem formulation 

The main objective of this paper is to have an efficient
parameterization for the RTF such that it is valid for variations
in the receiver position as well as in the source position.
Therefore, we first define a continuous spatial region where
the source(s) lie (source region) and a continuous spatial
region where the receiver(s) lie (receiver region), and the new
parameterization is expected to deliver the RTF between any
two arbitrary points from these two regions.

^1 Section II-B discusses how the order of a spatial soundfield is determined
over a known region and a given frequency.

For computational simplicity, we assume the receiver region,
named $\eta$, to be a sphere of radius $R_r$ centered at the origin $O$,
and the source region, named $\zeta$, to be another sphere of radius
$R_s$ centered at $O_s$ (see Fig. 1). In spherical coordinates, the
receiver point within $\eta$ is denoted by $\mathbf{x} = (x, \theta_x, \phi_x)$ and the
source location within $\zeta$ is denoted by $\mathbf{y} = (y, \theta_y, \phi_y)$, where
$\mathbf{y} = \mathbf{y}^{(s)} + \mathbf{R}_{sr}$, with $\mathbf{y}^{(s)} = (y^{(s)}, \theta_y^{(s)}, \phi_y^{(s)})$ representing the
same source location with respect to $O_s$ and $\mathbf{R}_{sr}$ representing
the vector connecting $O$ to $O_s$.

In a reverberant environment, the acoustic transfer function
between $\mathbf{x}$ and $\mathbf{y}$ can be decomposed into a direct path field
and a reflected field as

$$H(\mathbf{x}, \mathbf{y}, k) = H_{dir}(\mathbf{x}, \mathbf{y}, k) + H_{rvb}(\mathbf{x}, \mathbf{y}, k) \qquad (1)$$

where $k = 2\pi f / c$ is the wave number, $f$ is the frequency and $c$
is the speed of sound propagation. The direct field component
due to a unit amplitude point source at $\mathbf{y}$ is independent of
the room characteristics and can be given in terms of [25]

$$H_{dir}(\mathbf{x}, \mathbf{y}, k) = \frac{e^{ik\|\mathbf{x} - \mathbf{y}\|}}{4\pi \|\mathbf{x} - \mathbf{y}\|}. \qquad (2)$$

However, $H_{rvb}(\mathbf{x}, \mathbf{y}, k)$, the corresponding reflected field
incident at $\eta$, is unknown and completely dependent on the
room characteristics. Our aim is to parameterize this unknown
field so that a finite set of weights/coefficients unique to the
room will be capable of predicting $H_{rvb}(\mathbf{x}, \mathbf{y}, k)$ between any
two points from $\zeta$ and $\eta$.

We base our parameterization approach on the fact that the
unknown $H_{rvb}(\mathbf{x}, \mathbf{y}, k)$ incident on $\eta$ is caused by the outward
propagating wavefield from $\zeta$. Since both these incoming and
outgoing soundfields can be represented in terms of modal
decomposition^2, $H_{rvb}(\mathbf{x}, \mathbf{y}, k)$ could also be represented in
terms of a similar decomposition. The coefficients of such a
decomposition will then enable the user to predict the RTF
between two arbitrary points from $\zeta$ and $\eta$. Following the
above concept, we first decompose the reverberant field at $\eta$
due to an arbitrary outgoing field from $\zeta$ and then derive an
exact decomposition for the room transfer function.

B. Modal decomposition of an arbitrary reverberant field

Consider an arbitrary outgoing field from $\zeta$, which can be
represented in terms of a spherical harmonic decomposition^3
with respect to $O_s$ as

$$S_{out}(\mathbf{z}^{(s)}, k) = \sum_{n=0}^{N_s} \sum_{m=-n}^{n} \beta_{nm}(k)\, h_n(k z^{(s)})\, Y_{nm}(\theta_z^{(s)}, \phi_z^{(s)}) \qquad (3)$$

where $\mathbf{z}^{(s)} = (z^{(s)}, \theta_z^{(s)}, \phi_z^{(s)})$ denotes the observation point
outside of $\zeta$, $\beta_{nm}(k)$ denotes the coefficients of the outgoing
soundfield caused by the source distribution in $\zeta$, $Y_{nm}(\theta, \phi)$
denotes the spherical harmonic of order $n$ and degree $m$,
$h_n(\cdot)$ represents the spherical Hankel function of the first
kind with order $n$, and $N_s = \lceil k e R_s / 2 \rceil$ denotes the exterior
field truncation limit for a source distribution with its furthest
source located at a distance of $R_s$ [26]^4.

^2 Modal decomposition: a decomposition using the basis functions of the
solution to the wave equation.

^3 Other coordinate systems could be used instead of spherical coordinates,
resulting in a different set of basis functions.

If the resulting reflected field at $\eta$ due to each unit amplitude
outgoing mode of $\zeta$ can be extracted, the total reflected
field caused by an arbitrary outgoing field can be successfully
predicted. To demonstrate the above statement, let us consider
a unit amplitude outgoing wave of order $n'$ and mode $m'$

$$\beta_{nm}(k) = \begin{cases} 1, & n = n' \text{ and } m = m' \\ 0, & \text{otherwise,} \end{cases} \qquad (4)$$

producing

$$S_{out}(\mathbf{z}^{(s)}, k) = h_{n'}(k z^{(s)})\, Y_{n'm'}(\theta_z^{(s)}, \phi_z^{(s)}). \qquad (5)$$

For this particular outgoing source field, there will be a
resulting reflected field present at the receiver region $\eta$.
Irrespective of the geometrical configuration of $\zeta$ and $\eta$, the
mirror images of the sources within $\zeta$ are always outside of
$\eta$, and therefore the aforementioned reflected field will be a
source free incoming field. Such a soundfield can be given in
terms of a harmonic decomposition of the form

$$R_{n'm'}(\mathbf{x}, k) = \sum_{\nu=0}^{N_r} \sum_{\mu=-\nu}^{\nu} \alpha_{\nu\mu}^{n'm'}(k)\, j_\nu(k x)\, Y_{\nu\mu}(\theta_x, \phi_x) \qquad (6)$$

where $\alpha_{\nu\mu}^{n'm'}(k)$ denotes the soundfield coefficients of the
reverberant field incident at $\eta$ caused by a unit amplitude $n'$-th
order and $m'$-th mode outgoing soundfield at $\zeta$, $j_n(\cdot)$ represents
the spherical Bessel function of order $n$, and $N_r = \lceil k e R_r / 2 \rceil$
denotes the interior field truncation limit [26]^5. If $\alpha_{\nu\mu}^{n'm'}(k)$ of
(6) can be recorded up to order $N_r$ for each unit amplitude
outgoing mode from $\zeta$, the reverberant field at $\eta$ due to an
arbitrary outgoing field at $\zeta$ can be derived using (3), (5) and
(6) as

$$P_{rvb}(\mathbf{x}, k) = \sum_{n=0}^{N_s} \sum_{m=-n}^{n} \sum_{\nu=0}^{N_r} \sum_{\mu=-\nu}^{\nu} \beta_{nm}(k)\, \alpha_{\nu\mu}^{nm}(k)\, j_\nu(k x)\, Y_{\nu\mu}(\theta_x, \phi_x). \qquad (7)$$

^4 The truncation of a spherical harmonic based soundfield decomposition
was originally derived based on the high pass behavior of Bessel functions.
More precisely, Bessel functions of the form $j_n(x)$ at $x \le kr$ tend to be
close to zero for orders above $N = k e r / 2$, and play an insignificant role
in the infinite summation. In case the reader is confused by the absence of
Bessel functions in (3), please note that the modal coefficients $\beta_{nm}(k)$ of any
arbitrary outgoing soundfield can be represented in terms of Bessel functions
(26).

^5 Truncation is derived following the same principle discussed earlier.

C. Modal decomposition of the room transfer function

Now consider a unit amplitude point source at $\mathbf{y}^{(s)} \in \zeta$,
producing outgoing soundfield coefficients $\beta_{nm}(k)$ of the form

$$\beta_{nm}(k) = ik\, j_n(k y^{(s)})\, Y_{nm}^{*}(\theta_y^{(s)}, \phi_y^{(s)}). \qquad (8)$$

The corresponding reflected field at $\eta$ describes the unknown
reverberant component $H_{rvb}(\mathbf{x}, \mathbf{y}, k)$ of (1). This can be
derived using (7) and (8) as

$$H_{rvb}(\mathbf{x}, \mathbf{y}, k) = ik \sum_{n=0}^{N_s} \sum_{m=-n}^{n} \sum_{\nu=0}^{N_r} \sum_{\mu=-\nu}^{\nu} \alpha_{\nu\mu}^{nm}(k)\, j_n(k y^{(s)})\, j_\nu(k x)\, Y_{nm}^{*}(\theta_y^{(s)}, \phi_y^{(s)})\, Y_{\nu\mu}(\theta_x, \phi_x). \qquad (9)$$

Therefore, the total acoustic transfer function between any two
arbitrary points from the source region $\zeta$ and the receiver
region $\eta$ can be given in terms of the direct field (2) and
reflected field (9) components as

$$H(\mathbf{x}, \mathbf{y}, k) = \frac{e^{ik\|\mathbf{x} - \mathbf{y}\|}}{4\pi \|\mathbf{x} - \mathbf{y}\|} + ik \sum_{n=0}^{N_s} \sum_{m=-n}^{n} \sum_{\nu=0}^{N_r} \sum_{\mu=-\nu}^{\nu} \alpha_{\nu\mu}^{nm}(k)\, j_n(k y^{(s)})\, j_\nu(k x)\, Y_{nm}^{*}(\theta_y^{(s)}, \phi_y^{(s)})\, Y_{\nu\mu}(\theta_x, \phi_x). \qquad (10)$$

Comments:

• Based on the above result (10), the RTF can be
parameterized in terms of a spherical harmonic
decomposition. If $\alpha_{\nu\mu}^{nm}(k)$, the weights/coefficients of
this parameterization, can be accurately captured, they
can be used to derive the RTF between any two arbitrary
points from a continuous spatial region where the
source(s) lie and a continuous spatial region where the
receiver(s) lie.

• To generalize the RTF over an $N_s$-th order source region $\zeta$,
where $N_s = k_{max} e R_s / 2$, and an $N_r$-th order receiver region
$\eta$, where $N_r = k_{max} e R_r / 2$, the above parameterization
requires a minimum of $(N_r + 1)^2 (N_s + 1)^2$ unique
coefficients of the form $\alpha_{\nu\mu}^{nm}(k)$. For example, when
the maximum frequency of interest is $f_{max} = 1$ kHz and
the source and receiver regions of interest are both
spheres of radius 0.2 m with $N_s = N_r = 5$, a fixed
number of 1296 unique coefficients are required to
calculate the RTF between any two arbitrary points
$\mathbf{x}$ and $\mathbf{y}$ from $\eta$ and $\zeta$ respectively. In broadband
applications, the total coefficient count will increase with
each frequency sample $f_0$, requiring an additional set of
$(k_0 e R_r / 2 + 1)^2 (k_0 e R_s / 2 + 1)^2$ coefficients.

• Due to the decomposition of direct and reverberant
components, this parameterization supports any
configuration of $\eta$ and $\zeta$. As shown in Fig. 1, they
can be either completely separated from each other
with $\|\mathbf{R}_{sr}\| > (R_r + R_s)$ (Fig. 1(a)), concentric with
$\|\mathbf{R}_{sr}\| = 0$ (Fig. 1(b)), or overlapping on each other
with $\|\mathbf{R}_{sr}\| < (R_r + R_s)$ (Fig. 1(c)). Therefore, (10)
can be used to either partially or fully characterize the
room according to user requirements.

Fig. 1. Different configurations of the source region and the receiver region
(panel (b): Concentric).

• Taking all of the above properties into consideration, the
proposed parameterization can be interpreted as a modal
based solution to the wave equation in arbitrary room
environments. Compared to the classical mode solution
to the RTF defined for rectangular rooms [1], this
parameterization has three main advantages. First, this
method is applicable to any arbitrary room geometry.
Second, practical room environments (having furniture
etc.) with irregular boundary conditions are extremely
difficult to characterize by the classical solution,
whereas the new method is valid for any arbitrary acoustic
environment. Third, unlike the total mode count ($\propto V f^3$,
where $V$ denotes room volume) of the classical model,
that of the new model ($(N_r + 1)^2 (N_s + 1)^2$) can be
reduced by defining smaller $R_s$ and $R_r$ values to improve
the computational efficiency. (This property is especially
advantageous at the Schroeder frequency [27], where the
classical model will require a very large mode count
resulting in a Gaussian distribution.)
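
The truncation orders, the coefficient count, and the evaluation of (10) from a given coefficient set can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: the function names, the flat mode indexing `n*n + n + m`, the layout of the array `alpha`, and the speed of sound of 343 m/s are all assumptions introduced here.

```python
import math
import numpy as np
from scipy.special import lpmv, spherical_jn

def truncation_order(freq_hz, radius_m, c=343.0):
    """N = ceil(k e R / 2), with k = 2 pi f / c (c = 343 m/s is assumed)."""
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(k * math.e * radius_m / 2.0)

def coefficient_count(n_s, n_r):
    """Minimum number of coefficients, (N_s + 1)^2 (N_r + 1)^2."""
    return (n_s + 1) ** 2 * (n_r + 1) ** 2

def sph_harm(n, m, polar, azim):
    """Orthonormal complex spherical harmonic Y_nm (Condon-Shortley phase)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * lpmv(am, n, np.cos(polar)) * np.exp(1j * am * azim)
    return (-1) ** am * np.conj(y) if m < 0 else y

def to_spherical(p):
    """Cartesian vector -> (radius, polar angle, azimuth)."""
    r = float(np.linalg.norm(p))
    polar = math.acos(max(-1.0, min(1.0, p[2] / r))) if r > 0 else 0.0
    return r, polar, math.atan2(p[1], p[0])

def modes(order):
    """All (n, m) pairs up to `order`, in the flat order n*n + n + m."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]

def rtf(x, y, o_s, alpha, k):
    """Evaluate (10): direct field (2) plus the modal reverberant field (9).

    x, y  : receiver and source positions (Cartesian, relative to O)
    o_s   : centre of the source region (Cartesian, relative to O)
    alpha : array of shape ((N_s+1)^2, (N_r+1)^2); rows index the source
            mode (n, m), columns the receiver mode (nu, mu)
    """
    x, y, o_s = (np.asarray(v, dtype=float) for v in (x, y, o_s))
    dist = np.linalg.norm(x - y)
    direct = np.exp(1j * k * dist) / (4 * math.pi * dist)
    n_s = math.isqrt(alpha.shape[0]) - 1
    n_r = math.isqrt(alpha.shape[1]) - 1
    rx, tx, px = to_spherical(x)
    ry, ty, py = to_spherical(y - o_s)  # source relative to O_s
    src = np.array([spherical_jn(n, k * ry) * np.conj(sph_harm(n, m, ty, py))
                    for n, m in modes(n_s)])
    rcv = np.array([spherical_jn(v, k * rx) * sph_harm(v, mu, tx, px)
                    for v, mu in modes(n_r)])
    return direct + 1j * k * (src @ alpha @ rcv)

# Worked example from the second comment: f_max = 1 kHz, R_s = R_r = 0.2 m.
order = truncation_order(1000.0, 0.2)
count = coefficient_count(order, order)
```

With these assumptions, `order` evaluates to 5 and `count` to 1296, matching the figures quoted in the second comment. Setting `alpha` to zero reduces `rtf` to the free-field term (2), which is a convenient sanity check.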

III. Estimation of Room Transfer Function Coefficients

In this section, we present the procedure of estimating the
RTF coefficients $\alpha_{\nu\mu}^{nm}(k)$ of (10) for a pre-defined source
region and a pre-defined receiver region. As explained earlier,
$\alpha_{\nu\mu}^{nm}(k)$ represents the $\nu$-th order and $\mu$-th mode reverberant field
coefficient within $\eta$ caused by a unit amplitude $n$-th order and
$m$-th mode outgoing wavefront originated at $\zeta$ (5). For each
outgoing mode from $\zeta$, there will be $(N_r + 1)^2$
unique coefficients describing the reverberant field incident at
$\eta$, and to generalize an $N_s$-th order source region, a total
of at least $(N_s + 1)^2 (N_r + 1)^2$ coefficients needs to be extracted.

It is important to note that, in practice, the following method
does not require the physical production of unit amplitude
outgoing modes from $\zeta$ and associated room response recordings,
but only requires the acquisition of the room response between a
set of receivers distributed within $\eta$ and a set of loudspeakers
distributed within $\zeta$, each transmitting a unit amplitude signal.
Furthermore, given that the room characteristics remain stationary
over time, these measurements can be obtained using a single
microphone unit and a single loudspeaker unit. However, for
the purpose of deriving this result, we will discuss a method
to generate unit amplitude modal wavefronts propagating
outward from the source region, and a soundfield recording
technique to extract the corresponding room responses.
Theoretically, these processes are required to be repeated for a
minimum of $(N_s + 1)^2$ different cases, but their
physical implementation will be proven to be needless in
Sec. III-B.


A. Synthesis of a unit amplitude outgoing mode originated
from the source region

Let us first consider the problem of producing a unit amplitude
outgoing mode from $\zeta$ with respect to $O_s$ (5). In order to
account for all the significant outgoing modes from $\zeta$, $n'$ and
$m'$ from (4) have to be varied from 0 to $N_s$ and from $-n'$ to $n'$
respectively. This results in a total number of $(N_s + 1)^2$ distinct
soundfield production cases and corresponding weight vectors.

For each case, we propose a mode matching approach where
the modal coefficients of the desired outgoing field are
matched with those of the outgoing wavefield produced by an
array of loudspeakers distributed within $\zeta$. Consider $L$
point sources arbitrarily distributed within $\zeta$, where the
$\ell$-th source ($\ell = 1 \cdots L$) is located at $\mathbf{y}_\ell^{(s)} = (y_\ell^{(s)}, \theta_\ell^{(s)}, \phi_\ell^{(s)})$
with respect to $O_s$. The weighted sum of loudspeaker outputs
will produce an outgoing soundfield of the form (3), where
$\beta_{nm}(k)$ is [26]

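$$\beta_{nm}(k) = \sum_{\ell=1}^{L} w_\ell(k)\, ik\, j_n(k y_\ell^{(s)})\, Y_{nm}^{*}(\theta_\ell^{(s)}, \phi_\ell^{(s)}) \qquad (11)$$

with $w_\ell(k)$ representing the weights at each point source. Our
objective is to derive loudspeaker weights that will produce
(4), a unit amplitude outgoing wave of order $n'$ and mode $m'$.

This can be achieved by equating (4) to (11), which forms
a set of linear equations of the form

$$T \mathbf{w}^{n'm'} = \boldsymbol{\beta}^{n'm'} \qquad (12)$$

where

$$T = ik \begin{bmatrix} t_{00}(k, \mathbf{y}_1^{(s)}) & \cdots & t_{00}(k, \mathbf{y}_L^{(s)}) \\ \vdots & \ddots & \vdots \\ t_{N_s N_s}(k, \mathbf{y}_1^{(s)}) & \cdots & t_{N_s N_s}(k, \mathbf{y}_L^{(s)}) \end{bmatrix} \qquad (13)$$

is an $(N_s + 1)^2 \times L$ translation matrix with $t_{nm}(k, \mathbf{y}_\ell^{(s)}) =
j_n(k y_\ell^{(s)})\, Y_{nm}^{*}(\theta_\ell^{(s)}, \phi_\ell^{(s)})$, $\mathbf{w}^{n'm'} = [w_1^{n'm'}(k) \cdots w_L^{n'm'}(k)]^T$
is an $L$ long vector of loudspeaker weights, and
$\boldsymbol{\beta}^{n'm'} = [0 \cdots 0\ 1\ 0 \cdots 0]^T$ is an $(N_s + 1)^2$ long
vector of desired field coefficients where the $(n'^2 + n' + m' + 1)$-th
element is 1 while all others are zero. Since $T$ and $\boldsymbol{\beta}^{n'm'}$
are both known, the required weights at each loudspeaker can
be solved using

$$\mathbf{w}^{n'm'} = T^{\dagger} \boldsymbol{\beta}^{n'm'} \qquad (14)$$

where $(\cdot)^{\dagger}$ denotes the pseudoinverse. To avoid spatial aliasing,

$$L \ge (N_s + 1)^2 \qquad (15)$$

has to be satisfied, with (14) yielding the minimum energy
weight solution.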
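
The mode matching step in (11)-(14) can be illustrated with a short numerical sketch. The array geometry (random angles and random radii between 0.2 m and 0.3 m), the frequency of 300 Hz, the speed of sound of 343 m/s, and all function names below are assumptions made for illustration only; they are not taken from the paper.

```python
import math
import numpy as np
from scipy.special import lpmv, spherical_jn

def sph_harm(n, m, polar, azim):
    """Orthonormal complex spherical harmonic Y_nm (Condon-Shortley phase)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * lpmv(am, n, np.cos(polar)) * np.exp(1j * am * azim)
    return (-1) ** am * np.conj(y) if m < 0 else y

def translation_matrix(k, radii, polar, azim, n_s):
    """Build T of (13): one row per mode (n, m), one column per loudspeaker."""
    rows = [spherical_jn(n, k * radii) * np.conj(sph_harm(n, m, polar, azim))
            for n in range(n_s + 1) for m in range(-n, n + 1)]
    return 1j * k * np.array(rows)

def mode_weights(T, n_prime, m_prime):
    """Solve (12) via the pseudoinverse (14) for a unit (n', m') mode."""
    beta = np.zeros(T.shape[0], dtype=complex)
    beta[n_prime ** 2 + n_prime + m_prime] = 1.0  # 0-based index of (n', m')
    return np.linalg.pinv(T) @ beta, beta

rng = np.random.default_rng(0)
n_s, num_speakers = 3, 36            # L = 36 satisfies (15): L >= 16
k = 2 * np.pi * 300.0 / 343.0        # assumed 300 Hz, c = 343 m/s
radii = rng.uniform(0.2, 0.3, num_speakers)
polar = np.arccos(rng.uniform(-1.0, 1.0, num_speakers))
azim = rng.uniform(0.0, 2 * np.pi, num_speakers)

T = translation_matrix(k, radii, polar, azim, n_s)
w, beta = mode_weights(T, n_prime=2, m_prime=-1)
residual = np.linalg.norm(T @ w - beta)  # how well the target mode is matched
```

Because $L \ge (N_s + 1)^2$, the system is underdetermined and the pseudoinverse returns the minimum-norm weight vector among all exact solutions, which is why `residual` should be close to machine precision for a well conditioned `T`.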

While (14) provides a numerical solution to the required
loudspeaker weights, it is also important to decide upon a robust
array geometry. The spherical geometry has been widely
used for rendering and acquisition of spatial soundfields ll28l - 
m however, its performance in the above task can be 
predicted to be less robust due to the Bessel functions present 
in T. When $j_n(k y_\ell^{(s)})$ of (13) approaches zero crossings, the
condition number of T increases^6, and since the pseudoinverse
of an ill-conditioned matrix is often erroneous, those errors
will be propagated to the weight solution in (14). Similar
issues were experienced in spherical microphone arrays used
for interior field recording [28], which were later overcome
by using rigid arrays ll^ or variable radii arrays like the 
spherical shell array given in EH and the dual spherical 
array given in For the loudspeaker array of interest, a 
rigid geometry requires the incorporation of scattering effects 
and a dual spherical array requires twice the number of 
loudspeakers. Therefore, in this paper, we opt for the simplest 
geometry of choice, an open spherical shell. A spherical shell 
array is equally distributed in the angular space while the 
distance to each loudspeaker randomly varies (with a
uniform distribution) between a virtual spherical shell of outer
radius $R_s$ and an inner radius $R_s'$. For in depth reasoning and
additional solutions to the open sphere inverse problem, the
reader may refer to ll3^ " 
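
The effect of Bessel zeros on the conditioning of $T$ can be demonstrated with a small experiment. The sketch below compares a single-radius open sphere, placed exactly at the first zero of $j_1$, against a shell with uniformly random radii. The truncation at order 3, the array size, the inner radius of $0.6 R$, and the random angular placement are illustrative assumptions rather than values from the paper.

```python
import math
import numpy as np
from scipy.special import lpmv, spherical_jn

def sph_harm(n, m, polar, azim):
    """Orthonormal complex spherical harmonic Y_nm (Condon-Shortley phase)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * lpmv(am, n, np.cos(polar)) * np.exp(1j * am * azim)
    return (-1) ** am * np.conj(y) if m < 0 else y

def translation_matrix(k, radii, polar, azim, n_s):
    """Build T of (13): one row per mode (n, m), one column per loudspeaker."""
    rows = [spherical_jn(n, k * radii) * np.conj(sph_harm(n, m, polar, azim))
            for n in range(n_s + 1) for m in range(-n, n + 1)]
    return 1j * k * np.array(rows)

J1_ZERO = 4.493409457909064  # first positive zero of j_1 (root of tan x = x)

rng = np.random.default_rng(1)
n_s, num_speakers, outer_radius = 3, 40, 0.3
k = J1_ZERO / outer_radius           # wave number that puts j_1(kR) at a zero
polar = np.arccos(rng.uniform(-1.0, 1.0, num_speakers))
azim = rng.uniform(0.0, 2 * np.pi, num_speakers)

# Open sphere: every loudspeaker at the same radius, so the n = 1 rows vanish.
sphere_radii = np.full(num_speakers, outer_radius)
# Shell: radii drawn uniformly between an assumed inner radius and R.
shell_radii = rng.uniform(0.6 * outer_radius, outer_radius, num_speakers)

cond_sphere = np.linalg.cond(translation_matrix(k, sphere_radii, polar, azim, n_s))
cond_shell = np.linalg.cond(translation_matrix(k, shell_radii, polar, azim, n_s))
```

In this setting `cond_sphere` is many orders of magnitude larger than `cond_shell`, since the three rows belonging to $n = 1$ are numerically zero on the single-radius array while the shell samples $j_1$ away from its zero.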

An alternate approach for achieving robust loudspeaker
arrays was recently introduced in [34] for 2D soundfields,
and in [35] for 3D soundfields, where the conventional
circular/spherical arrays of monopole loudspeakers were replaced
by those of higher order loudspeakers. A 3D higher order
loudspeaker of order $D$ is capable of producing polar
responses up to the $D$-th order. This solution significantly reduces
the minimum requirement of loudspeaker units by a factor
of $1/(D + 1)^2$ at the expense of increased complexity at
each loudspeaker unit, and therefore it is more suitable for
sound reproduction in large spatial areas. Since the practical
implementation of higher order loudspeakers is still in the
design stage, the above approach is not used in this paper.
However, the reader is encouraged to refer to [34], [35] for
a detailed description of the array processing involved with
sound rendering using higher order loudspeakers.


^6 The 2-norm condition number of a matrix $T$ is defined by $\kappa_2(T) =
\|T\|_2 \cdot \|T^{\dagger}\|_2$, and for a well conditioned matrix, it will be close to 1.


B. Extracting the room response at the receiver region 

Once the desired outgoing waves are synthesized at $\zeta$, the
next step involves the extraction of the resulting room reflections
incident at $\eta$. It is important to note that all recordings obtained
at $\eta$ carry both direct and reflected wavefronts originated at $\zeta$,
and since we are only parameterizing the reverberant field, the
direct field component at each sensor output must be removed
prior to further processing.

Furthermore, as the reverberant field of interest (6) is a
source free incoming field, its extraction can be treated as an
interior field recording problem. The conventional approach
to record an $N_r$-th order incoming spatial soundfield requires a
minimum of $(N_r + 1)^2$ omnidirectional microphones equally
distributed on a spherical surface enclosing the region of
interest 1^ . However, this approach has been proven to be less
robust due to the above mentioned "Bessel zero problem" and 
as explained earlier, alternate geometries were later proposed 
in Eol-illll to overcome this issue. 

A further improved solution to the interior field recording
problem was recently introduced in [26], where the
omnidirectional microphones were replaced by higher order (HO)
microphones. A higher order microphone of order $A$ is capable
of recording an $A$-th order spatial soundfield with respect to
the microphone's local origin. Thus, the use of $A$-th order
microphones in recording an $N_r$-th order soundfield substantially
reduces the minimum requirement of measurements by a factor
of $1/(A + 1)^2$ at the expense of added complexity at each
microphone unit. Compared to the conventional omnidirectional
microphone array, this approach also showed a significant
improvement in the condition number of the translation matrix,
which in turn increased the array's robustness. Furthermore,
unlike the higher order loudspeakers, the practical
implementation of HO microphones is relatively simple, and there
exist a number of different designs that were successfully
implemented in practice [33]. The "Eigenmike" is one such
commercially available fourth order microphone with an active
frequency range of 0 to 6.5 kHz.

Due to the aforementioned efficiency and availability of HO
microphones, we propose an array of $Q$ identical $A$-th order
microphones to be employed in the coefficient extraction
process. For each unit amplitude outgoing wavefield produced
at $\zeta$, there will be an $N_r$-th order soundfield incident at $\eta$, and
according to [26], the extraction of such a soundfield requires
a minimum of $Q \ge (N_r + 1)^2 / (A + 1)^2$ HO microphone units
distributed in any arbitrary geometry enclosing the region
of interest. The translation between the HO microphone
outputs and the desired reverberant soundfield is based on a
coefficient translation theorem developed in [26], which will
be discussed in detail in Sec. III-B3.

1) Higher order microphone: Let us now briefly discuss
the functionality of a 3D HO microphone. Consider the
$q$-th ($q = 1 \cdots Q$) HO microphone located at $O_q$, with $\mathbf{R}_q =
(R_q, \theta_q, \phi_q)$ representing the vector connecting $O$ to $O_q$.
For numerical simplicity, we assume it is designed following
the open array geometry given in [29], where an $A$-th order
microphone is composed of an array of $Q' \ge (A + 1)^2$
omnidirectional microphones equally distributed along a
virtual spherical surface of radius $r = A c / (\pi e f_{max})$, where
$f_{max}$ denotes the maximum frequency of interest in broadband
operation. The $q'$-th ($q' = 1 \cdots Q'$) microphone of this array
will be located at $\mathbf{r}_{qq'} = (r_{qq'}, \theta_{qq'}, \phi_{qq'})$ with respect to $O_q$,
recording

$$P_{qq'}(k) = \sum_{a=0}^{A} \sum_{b=-a}^{a} \gamma_{ab}^{(q)}(k)\, j_a(k r_{qq'})\, Y_{ab}(\theta_{qq'}, \phi_{qq'}) \qquad (16)$$

where $r_{qq'} = r$ for all $q'$ and $\gamma_{ab}^{(q)}(k)$ represents the soundfield
coefficients with respect to $O_q$. Over the entire array, there
will be a total of $Q'$ recordings of the above form, and based
on the orthogonal property of spherical harmonics [29], they
can be collectively combined to extract $\gamma_{ab}^{(q)}(k)$ using

$$\gamma_{ab}^{(q)}(k) = \frac{1}{j_a(k r_{qq'})} \sum_{q'=1}^{Q'} P_{qq'}(k)\, Y_{ab}^{*}(\theta_{qq'}, \phi_{qq'}). \qquad (17)$$

When the above microphone is used to record the room
response caused by the $n$-th order and $m$-th mode unit outgoing
wavefield originated from $\zeta$, the incident pressure at the $q'$-th
omnidirectional microphone will be

$$P_{qq'}^{(n,m)}(k) = \sum_{\ell=1}^{L} w_\ell^{nm}(k)\, H(k, \mathbf{x}_{qq'}, \mathbf{y}_\ell) \qquad (18)$$

where $H(k, \mathbf{x}_{qq'}, \mathbf{y}_\ell)$ denotes the RTF between the
omnidirectional receiver at $\mathbf{x}_{qq'} = \mathbf{R}_q + \mathbf{r}_{qq'}$ and the weighted point
source at $\mathbf{y}_\ell = \mathbf{y}_\ell^{(s)} + \mathbf{R}_{sr}$ with respect to $O$. Substituting for
$P_{qq'}(k)$ in (17) from (18), we derive the corresponding outputs at the $q$-th
HO microphone as

$$\gamma_{ab}^{(q,n,m)}(k) = \sum_{\ell=1}^{L} w_\ell^{nm}(k)\, \gamma_{ab}^{(q,\ell)}(k), \quad \gamma_{ab}^{(q,\ell)}(k) = \frac{1}{j_a(k r_{qq'})} \sum_{q'=1}^{Q'} H(k, \mathbf{x}_{qq'}, \mathbf{y}_\ell)\, Y_{ab}^{*}(\theta_{qq'}, \phi_{qq'}) \qquad (19)$$

where $\gamma_{ab}^{(q,\ell)}(k)$ denotes the incident soundfield coefficients at
$O_q$ caused by a unit amplitude loudspeaker located at $\mathbf{y}_\ell$.

Consequently, if the room response between the $\ell$-th
loudspeaker and the $q$-th HO microphone can be recorded
for all $L$ loudspeakers and all $Q$ HO microphones, then
$\gamma_{ab}^{(q,n,m)}(k)$ can be easily derived using the linearity property
given in (19). This profound result significantly simplifies the
coefficient extraction process by completely eliminating the
requirement for (18)'s physical implementation. Furthermore,
all $(N_s + 1)^2$ distinct cases of (18) can now be synthesized
using the same set of $\gamma_{ab}^{(q,\ell)}(k)$ measurements and appropriate
numerical processing.

2) Removal of the direct soundfield: As $\gamma_{ab}^{(q,n,m)}(k)$ can be
considered as the HO microphone outputs, it is essential to
remove their direct path components prior to further array
processing. The coefficients $\gamma_{ab}^{(q,n,m)}(k)$ can be decomposed
into direct and reverberant field components as

$$\gamma_{ab}^{(q,n,m)}(k) = \gamma_{ab}^{(q,n,m),dir}(k) + \gamma_{ab}^{(q,n,m),rvb}(k) \qquad (20)$$

where the direct field component is known to be [25]

$$\gamma_{ab}^{(q,n,m),dir}(k) = \sum_{\ell=1}^{L} w_\ell^{nm}(k)\, ik\, h_a(k R_{q\ell})\, Y_{ab}^{*}(\theta_{q\ell}, \phi_{q\ell}) \qquad (21)$$

with $(R_{q\ell}, \theta_{q\ell}, \phi_{q\ell})$ denoting the spherical coordinates of
$\mathbf{R}_{q\ell} = \mathbf{y}_\ell - \mathbf{R}_q$. Therefore, once $\gamma_{ab}^{(q,n,m)}(k)$ are obtained,
(20) and (21) can be used to eliminate their direct field
components.

3) Array of higher order microphones: The final step in
array processing involves the translation of $\gamma_{ab}^{(q,n,m),rvb}(k)$ to
the desired RTF coefficients $\alpha_{\nu\mu}^{nm}(k)$. As mentioned earlier,
this can be done following the coefficient translation theorem
introduced in [35] as

$$\gamma_{ab}^{(q,n,m),rvb}(k) = \sum_{\nu=0}^{N_r} \sum_{\mu=-\nu}^{\nu} \alpha_{\nu\mu}^{nm}(k)\, S_{\nu\mu}^{ab}(\mathbf{R}_q) \qquad (22)$$

where the translation coefficients $S_{\nu\mu}^{ab}(\mathbf{R}_q)$ are those of the
theorem in [35] and include the factor
$\sqrt{(2\nu + 1)(2a + 1)(2\ell' + 1)/4\pi}\; W_1 W_2$, with $W_1$ and $W_2$
representing Wigner 3-j symbols. For all $Q$
microphones, (22) can be interpreted in matrix form as

$$\boldsymbol{\gamma} = T' \boldsymbol{\alpha} \qquad (23)$$

where $\boldsymbol{\gamma} = [\gamma_{00}^{(1,n,m)} \cdots \gamma_{AA}^{(1,n,m)} \cdots \gamma_{00}^{(Q,n,m)} \cdots \gamma_{AA}^{(Q,n,m)}]^T$ is a
$Q(A + 1)^2$ long vector, $\boldsymbol{\alpha} = [\alpha_{00}^{nm} \cdots \alpha_{N_r N_r}^{nm}]^T$ is a $(N_r + 1)^2$
long vector, and

$$T' = \begin{bmatrix} S_{00}^{00}(\mathbf{R}_1) & \cdots & S_{N_r N_r}^{00}(\mathbf{R}_1) \\ \vdots & \ddots & \vdots \\ S_{00}^{AA}(\mathbf{R}_Q) & \cdots & S_{N_r N_r}^{AA}(\mathbf{R}_Q) \end{bmatrix} \qquad (24)$$

is a $Q(A + 1)^2 \times (N_r + 1)^2$ matrix. As $T'$ is known, and the
local recordings in $\boldsymbol{\gamma}$ can be derived from (19), (23) can be
solved to find the desired coefficients $\boldsymbol{\alpha}$, using

$$\boldsymbol{\alpha} = T'^{\dagger} \boldsymbol{\gamma}. \qquad (25)$$

To avoid spatial aliasing,

$$Q \ge (N_r + 1)^2 / (A + 1)^2 \qquad (26)$$

has to be satisfied [26], with (25) yielding a least squares
solution.
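
The least squares step (25) and the microphone count (26) can be sketched as follows. Note the main assumption: the matrix `T_prime` below is a random complex stand-in with the correct dimensions, not the actual translation matrix of (24), whose entries require the coefficient translation theorem. The function names and the choice $N_r = 5$, $A = 4$, $Q = 3$ are likewise illustrative.

```python
import numpy as np

def min_ho_microphones(n_r, a):
    """Smallest integer Q satisfying (26): Q >= (N_r + 1)^2 / (A + 1)^2."""
    return -(-(n_r + 1) ** 2 // (a + 1) ** 2)  # integer ceiling division

def solve_coefficients(T_prime, gamma):
    """Least squares solution of (23) through the pseudoinverse, as in (25)."""
    return np.linalg.pinv(T_prime) @ gamma

n_r, a = 5, 4                     # 5th order region, 4th order microphones
q_min = min_ho_microphones(n_r, a)
num_mics = q_min + 1              # one extra unit gives an overdetermined system

rng = np.random.default_rng(2)
rows, cols = num_mics * (a + 1) ** 2, (n_r + 1) ** 2
# Stand-in for T' of (24): random entries, used only to exercise the algebra.
T_prime = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
alpha_true = rng.standard_normal(cols) + 1j * rng.standard_normal(cols)

gamma = T_prime @ alpha_true      # noiseless synthetic "recordings"
alpha_hat = solve_coefficients(T_prime, gamma)
error = np.linalg.norm(alpha_hat - alpha_true)
```

With $N_r = 5$ and $A = 4$, (26) requires at least two HO microphones. In the noiseless case with a full column rank matrix, the pseudoinverse recovers the coefficients exactly up to rounding; with measurement noise it returns the least squares estimate instead.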
C. Summary of the coefficient extraction process

The coefficient extraction process involved with the RTF
parameterization proposed over an $N_s$-th order source region
and an $N_r$-th order receiver region is summarized as follows.
Theoretically, this process requires $(N_s + 1)^2$
distinct outgoing waves created at $\zeta$ and the same number
of reverberant field extractions simultaneously carried out at
$\eta$. Each case requires a minimum of $(N_s + 1)^2$ point sources
or $(N_s + 1)^2 / (D + 1)^2$ $D$-th order loudspeakers at $\zeta$
to synthesize the outgoing field, and a minimum of $(N_r + 1)^2$
omnidirectional microphones or $(N_r + 1)^2 / (A + 1)^2$
$A$-th order microphones to extract the reverberant field at $\eta$.
Depending on the size and frequency content of the source and
receiver regions, the user can employ any combination of point
sources, higher order sources, omnidirectional microphones
and higher order microphones. As explained in Sections III-A
and III-B, this work illustrates one of the above combinations,
an open spherical shell array of point sources and an open
spherical array of HO microphones.

However, in practice, it is only required to extract the room
response between each loudspeaker and each microphone from
the above arrays. By incorporating these measurements
into the numerical computations given in (14), ( fT^ , (20) and
(25), the desired RTF coefficients can be derived.
Moreover, provided the room characteristics remain stationary
over time, the above measurements can be obtained using a
single microphone unit and a single loudspeaker unit moved
along the respective arrays.

Practical limitations of the proposed measurement method
arise with large $R_S$ and $R_R$ values at high frequencies due
to the increased number of modal components required to
describe the spatial soundfield of interest. Common forms
of these limitations include the large numbers of microphones
and loudspeakers required in non-stationary conditions,
increased demand for computational power, and the
design and implementation constraints involved with large
spherical/shell geometries. Furthermore, the proposed use of
HO microphones to record the field over $\eta$ may not be practically
feasible at present due to the lack of affordable HO microphones that are
commercially available. The above constraints may however
be overcome by defining smaller source and receiver regions to
suit the application of interest, and the HO microphone array
can be replaced with omnidirectional microphones to reduce
costs. In addition to the above constraints, in a real room,
temperature (and to a lesser extent humidity) fluctuations can
cause the room impulse responses to change, especially in the
late reverberant tails and the higher frequency components.
However, the proposed approach is likely to be accurate at
low frequencies and in modeling the RTF components of
low order reflections.

D. Approximate parameterization error 

The total error involved with the proposed RTF parameterization
has several components that are encountered
at different stages of the parameterization process. The first
component appears at the loudspeaker array processing
phase (fTST , in the forms of truncation error ($N_S = \lceil k e R_S/2 \rceil$)
in (O, and least squares error in (14) related to the geometry
and number of loudspeakers. The next component occurs
at each HO microphone, again in the forms of truncation error
($A = \lceil k e r/2 \rceil$) in (16), and Bessel-zero error in (17). The final
component develops in the coefficient translation phase,
once more in the forms of truncation error ($N_R = \lceil k e R_R/2 \rceil$)
in (6), and least squares error in (25). A detailed decomposition
of each of the above components is of little interest in the
current context; thus, we only study the total error. For computational
simplicity, we define an approximate error averaged
over a finite number of design points from $\zeta$ and $\eta$ as

$$E = \frac{\sum_{g=1}^{G} \| H(x_g, y_g, k) - \hat{H}(x_g, y_g, k) \|}{\sum_{g=1}^{G} \| H(x_g, y_g, k) \|} \qquad (27)$$

where $G$ denotes the number of source and receiver point
combinations being considered, $H$ denotes the existing
RTF and $\hat{H}$ its reconstruction from the parameterization.
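The averaged error in (27) can be sketched in a few lines. This is a minimal example under two assumptions that are not stated in the source: the norm is taken as the complex modulus of a scalar RTF value at a single frequency, and the function name `parameterization_error` is hypothetical.

```python
def parameterization_error(h_actual, h_reconstructed):
    """Normalized error of (27) over G paired source/receiver points."""
    if not h_actual or len(h_actual) != len(h_reconstructed):
        raise ValueError("need two non-empty sequences of equal length")
    # Numerator: summed magnitude of the reconstruction error.
    num = sum(abs(h - h_hat) for h, h_hat in zip(h_actual, h_reconstructed))
    # Denominator: summed magnitude of the actual RTF values.
    den = sum(abs(h) for h in h_actual)
    return num / den

# Made-up values for G = 3 point pairs; a uniform 10% scaling error
# gives a normalized error of 0.1.
h_actual = [1 + 1j, 2 + 0j, -1j]
h_scaled = [1.1 * h for h in h_actual]
err_exact = parameterization_error(h_actual, h_actual)
err_scaled = parameterization_error(h_actual, h_scaled)
```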

IV. Simulations 

In the following simulation examples, we illustrate the
accuracy of the proposed RTF parameterization in broadband
applications. A 6 × 5 × 2.5 m rectangular room was considered
as the reverberant environment with its center defined as the
origin $O$. The RTF was parameterized over a spherical receiver
region $\eta$ of radius $R_R = 0.4$ m centered about $O$ and a
spherical source region $\zeta$ of radius $R_S = 0.4$ m centered about
$O_S$. The location of $O_S$ was varied to simulate a
non-overlapping and an overlapping configuration of $\zeta$ and $\eta$.
The design frequency range was assumed up to $f_{max} = 1$ kHz,
producing a tenth order receiver region ($N_{R(max)} = 10$) and
a tenth order source region ($N_{S(max)} = 10$). For frequencies
below $f_{max}$, the truncation limits drop, and therefore,
when operating with varying frequencies, $N_S$ and $N_R$ were
varied accordingly.
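The frequency dependence of the truncation order can be checked numerically. The sketch below evaluates the rule N = ⌈keR/2⌉ quoted in the error discussion, assuming a speed of sound of 343 m/s (the source does not state the value it used) and a hypothetical helper name `truncation_order`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the source

def truncation_order(freq_hz, radius_m, c=SPEED_OF_SOUND):
    """Truncation order N = ceil(k e R / 2) with wavenumber k = 2 pi f / c."""
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(k * math.e * radius_m / 2.0)

# Orders for a 0.4 m region at a few frequencies up to the 1 kHz design limit.
orders = {f: truncation_order(f, 0.4) for f in (250, 500, 750, 1000)}
modes_at_fmax = (orders[1000] + 1) ** 2
```

Under these assumptions the order at 1 kHz is 10, which gives the 121 modes referred to in the text, and the order at 500 Hz is 5.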

From (fTSt , the synthesis of a unit amplitude outgoing wave
from $\zeta$ required a minimum of $L = 121$ point sources distributed
in a preferred geometry. As explained in Sec. III-A,
we opted for a spherical shell geometry, which required the
sources to be equally distributed in angle and
randomly varied in radius. While [37] provided
an approximate solution for the desired angular distribution,
the distance to each source $\|y_\ell\|$ was randomly varied (with
uniform distribution) between the outer radius
$R_S = 0.4$ m and the inner radius $R_S' = 0.3$ m of the spherical shell.
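A spherical shell layout of this kind can be sketched as follows. The angular scheme used here is a Fibonacci lattice, which is a swapped-in stand-in for the approximately equal angular division taken from the cited reference; only the uniformly random radius between 0.3 m and 0.4 m follows the description above. The function name `spherical_shell_points` and the fixed seed are assumptions.

```python
import math
import random

def spherical_shell_points(count, r_inner, r_outer, seed=0):
    """Near-uniform directions with radii drawn uniformly from [r_inner, r_outer]."""
    rng = random.Random(seed)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count   # cosine of the polar angle, evenly spaced
        rho = math.sqrt(1.0 - z * z)        # distance from the z axis on the unit sphere
        phi = i * golden_angle              # azimuth advances by the golden angle
        r = rng.uniform(r_inner, r_outer)   # random radius inside the shell
        points.append((r * rho * math.cos(phi), r * rho * math.sin(phi), r * z))
    return points

# 121 point sources between the inner (0.3 m) and outer (0.4 m) radii.
loudspeakers = spherical_shell_points(121, 0.3, 0.4)
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z in loudspeakers]
```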

The spherical shell geometry was selected over
the conventional single sphere geometry to improve the
array robustness, and we validated this decision by comparing
the condition number $\kappa_2(T)$ of the translation matrix
for both geometries. In Fig. 2, the condition number
$\kappa_2(T)$ is plotted against frequency for a spherical shell
geometry with the above parameters and a single sphere
geometry of radius $R_S = 0.4$ m. The expected ill-conditioning
of the single sphere geometry is evident, with $\kappa_2(T)$
reaching large peaks at $f = 420$ Hz and $f = 850$
Hz. In contrast, $\kappa_2(T)$ of the spherical shell geometry gives
Fig. 2. Condition number variation of T in o.


Fig. 4. (a) Actual and (b) reconstructed RTF between a fixed receiver location
at $y = (0, 0, 0)$ m and all points in the source region for a non-overlapping
distribution of $\eta$ and $\zeta$ with $R_{sr} = (1, 1, 0.5)$ m.



Fig. 3. Actual and reconstructed RTF between a fixed source location $y$
and all points in the receiver region for a non-overlapping distribution of $\eta$
and $\zeta$ with $R_{sr} = (1, 1, 0.5)$ m. (a) Actual and (b) reconstructed RTF
for $y = (1.05, 1.05, 0.5707)$ m. (c) Actual and (d) reconstructed RTF for
$y = (1.15, 1.15, 0.6207)$ m.


much improved results by avoiding all of the above peaks.
Therefore, we can conclude that a variation of $\|y_\ell\|$ in T
successfully overcomes the Bessel zero problem in solving
(Ell. The sawtooth characteristic of the condition number
variation may have been caused by the frequency-dependent mode
order ($N_S$).
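One way to get a feel for the Bessel zero problem is to list where the zeroth order spherical Bessel function, j_0(x) = sin(x)/x, vanishes for a single radius. The sketch below does this for a 0.4 m sphere, assuming a speed of sound of 343 m/s. Under that assumption the zeros fall close to the two peaks reported for the single sphere geometry, although the source does not attribute the peaks to a particular order, and higher orders have zeros of their own.

```python
import math

def j0(x):
    """Zeroth order spherical Bessel function sin(x)/x."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def j0_zero_frequencies(radius_m, f_max_hz, c=343.0):
    """Frequencies up to f_max_hz where j0(k R) = 0, i.e. where k R = n pi."""
    freqs = []
    n = 1
    while n * c / (2.0 * radius_m) <= f_max_hz:
        freqs.append(n * c / (2.0 * radius_m))
        n += 1
    return freqs

# Zeros of j0 for a 0.4 m radius below 1 kHz: 428.75 Hz and 857.5 Hz.
zeros = j0_zero_frequencies(0.4, 1000.0)
```

Spreading the sources over a range of radii means that no single frequency places every source on a zero at once, which is consistent with the smoother condition number reported for the shell geometry.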

Once the unit amplitude outgoing wavefields were synthesized
at $\zeta$, it was required to extract the resulting reverberant
fields incident at $\eta$. For this purpose, we opted for a spherical
array geometry of radius $R_R = 0.4$ m enclosing $\eta$, and according
to (26), a minimum of 121 omnidirectional microphones
were required to avoid spatial aliasing. To improve the array
robustness and to reduce the number of measurements, we
replaced the omnidirectional microphones with third order
ones ($A = 3$), which substantially reduced the minimum
requirement of $Q$ to $(N_R+1)^2/(A+1)^2 \approx 9$. It is important
to note that $Q$ was restricted to square numbers in order to
facilitate the spatial distribution given in [37], which provides
an approximate solution to the equal division of a spherical
surface.



Fig. 5. Approximate parameterization error for 4 different
cases. In each case, the error was averaged over $G = 7$
source and receiver point combinations, where each point was at
[(0, 0, 0), (-R, 0, 0), (R, 0, 0), (0, -R, 0), (0, R, 0), (0, 0, -R), (0, 0, R)]
with respect to $O_S$ and $O$ respectively.


However, in the simulations given below, we assumed the
room acoustics to be stationary, which in turn required only
one point source and one third order microphone to measure
the room response between 121 × 9 combinations of $y_\ell$ and
$R_q$. The number of measurements can be further reduced by
an approximate factor of $1/(D+1)^2$ if the point source is
replaced by a higher order loudspeaker of order $D$.
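The microphone and measurement counts quoted above follow from simple arithmetic, sketched here. The helper names are hypothetical; the rounding up to the next square number reflects the restriction on Q described in the text.

```python
import math

def min_ho_microphones(n_r, a):
    """Smallest integer Q with Q >= (N_R+1)^2 / (A+1)^2, as required by (26)."""
    return math.ceil((n_r + 1) ** 2 / (a + 1) ** 2)

def next_square(n):
    """Smallest perfect square that is not less than n."""
    root = math.isqrt(n)
    return n if root * root == n else (root + 1) ** 2

q_min = min_ho_microphones(10, 3)       # 121 / 16 = 7.5625, rounded up to 8
q_used = next_square(q_min)             # restricted to square numbers: 9
point_sources = (10 + 1) ** 2           # 121 point sources for N_S = 10
measurements = point_sources * q_used   # 121 x 9 = 1089 room responses
```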

The actual room response measurements were simulated
using the image-source method m which defines the RTF
between $x$ and $y$ in terms of

$$H(x, y, k) = h_0(k\|x - y\|) + \sum_{i=1}^{I} \xi_i\, h_0(k\|x - y_i\|) \qquad (28)$$

where $y_i$ and $\xi_i$ are the position and accumulated wall
reflection coefficient of the $i$th image source. In this paper, we
considered image sources up to the second order with wall
reflection coefficients [0.9 0.9 0.9 0.9 0.7 0.7].
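A compact sketch of (28) for a rectangular room is given below. Several choices here are assumptions rather than details from the source: h_0 is taken to be the zeroth order spherical Hankel function of the first kind, positions are measured from a room corner (the text places the origin at the room center, so coordinates are shifted by half the room size), and the six reflection coefficients are ordered as the two x walls, the two y walls, then floor and ceiling. The function names are hypothetical.

```python
import cmath
import itertools
import math

def h0(z):
    """Zeroth order spherical Hankel function of the first kind, exp(iz)/(iz)."""
    return cmath.exp(1j * z) / (1j * z)

def image_sources(src, room, betas, max_order=2):
    """Yield (position, accumulated reflection coefficient) for orders 1..max_order."""
    span = range(-max_order, max_order + 1)
    for m in itertools.product(span, repeat=3):
        order = sum(abs(mi) for mi in m)
        if order == 0 or order > max_order:
            continue
        pos, coeff = [], 1.0
        for axis, mi in enumerate(m):
            length, s = room[axis], src[axis]
            # Even index: shifted copy of the source; odd index: mirrored copy.
            pos.append(mi * length + (s if mi % 2 == 0 else length - s))
            # The wall on the image's side is hit ceil(|m|/2) times, the other floor(|m|/2).
            n_major, n_minor = (abs(mi) + 1) // 2, abs(mi) // 2
            side = 1 if mi > 0 else 0
            coeff *= betas[2 * axis + side] ** n_major * betas[2 * axis + 1 - side] ** n_minor
        yield tuple(pos), coeff

def rtf(x, y, k, room, betas, max_order=2):
    """H(x, y, k) of (28): direct path plus weighted image-source terms."""
    total = h0(k * math.dist(x, y))
    for pos, coeff in image_sources(y, room, betas, max_order):
        total += coeff * h0(k * math.dist(x, pos))
    return total

room = (6.0, 5.0, 2.5)
betas = (0.9, 0.9, 0.9, 0.9, 0.7, 0.7)
k = 2.0 * math.pi * 900.0 / 343.0      # 900 Hz with an assumed c = 343 m/s
x = (3.0, 2.5, 1.25)                   # receiver at the room center
y = (4.05, 3.55, 1.8207)               # source (1.05, 1.05, 0.5707) m from the center
H = rtf(x, y, k, room, betas)
images = list(image_sources(y, room, betas))
```

Limiting the reflection order to two gives 6 first order and 18 second order image sources, so the sum in (28) has 24 terms in this configuration.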

We first looked at a non-overlapping distribution of $\eta$ and $\zeta$
by defining the vector from $O$ to $O_S$ as $R_{sr} = (1, 1, 0.5)$ m.
Once $\gamma_{ab}^{(q,\ell)}(k)$ of (19) were measured using the simulated
environment given in (28), we calculated the loudspeaker weight
vector $w_{nm}$ for $(N_S+1)^2$ distinct cases accounting for all
combinations of $n$ and $m$. Afterward, the weights were incorporated
in (19) to calculate the corresponding coefficients for each case, which
were later modified using
(20) and (EB to derive the HO microphone recordings of
the reverberant field. Finally, these recordings were
translated to the desired RTF coefficients ($\alpha_{v\mu}^{nm}(k)$), using the
coefficient translation relationship given in (22).
Even though the extracted RTF coefficients are capable of
mapping each point in the source region to each point in the
receiver region over 0-1 kHz, it is not possible to plot
them all at once. Therefore, we first demonstrated the array
robustness to receiver variations by plotting the RTF between
a particular point in the source region and all points in the
receiver region for a single frequency. Next, we generated a
similar plot for a secondary source location to observe the
array robustness to a slight variation in source positioning. In
order to further validate this property, we finally plotted the
RTF between a particular point in the receiver region and all
points in the source region, accounting for all possible source
variations. Note that all spatial plots were constrained
to a 2D horizontal cross section with zero elevation for ease
of presentation.

Figures 3(a) and 3(b) show the actual and reconstructed RTF
between a source at $y = (1.05, 1.05, 0.5707)$ m and all points
in the receiver region for a frequency of $f = 900$ Hz (the
circle represents the receiver region). Similarly, Figs. 3(c) and
3(d) show the corresponding results for a secondary source
at $y = (1.15, 1.15, 0.6207)$ m. The reconstructed results in
both cases appear almost error-free, verifying the accuracy of
the proposed parameterization and its robustness to receiver
variations as well as a slight variation in source positioning.
Demonstrating source variations on a larger scale, Figs. 4(a)
and 4(b) show the actual and reconstructed RTF between a
fixed receiver at $y = (0, 0, 0)$ m and all points in the source
region (the circle represents the source region). As expected,
the reconstructed field is almost the same as the actual one,
verifying the proposed model's robustness to source variations.

Analyzing the broadband performance of the proposed
parameterization, we plotted the reproduction error
of the recorded RTF against frequency over a range of
200-1700 Hz. In order to average the error, we considered
a sample set of 7 points from the source region and 7 points
from the receiver region, resulting in $G = 7$ one-to-one
combinations*. The source and receiver points were located at
[(0, 0, 0), (-R, 0, 0), (R, 0, 0), (0, -R, 0), (0, R, 0), (0, 0, -R),
(0, 0, R)] with respect to $O_S$ and $O$ respectively. Figure 5
shows the results for different values of $R$. The error remains
very low up to the maximum frequency of interest, 1 kHz,
beyond which it slowly builds up. The increasing error
from 1 kHz onwards is due to spatial aliasing in
both the reproduction and recording phases. The low amplitude
errors present within the active frequency range (0.2-1
kHz) possibly stem from the HO microphone
simulations. When a fixed $A$th order microphone with a
maximum recordable frequency $f_{max}$ is employed to record
low frequencies ($f < f_{max}$), the $(a,b)$ mode will be successfully
recorded only if $f_{ab} < f$, where $f_{ab}$ denotes the activation
frequency of the $a$th order $b$th mode component of the
soundfield of interest. In other words, at low frequencies, the
soundfield modes that are actually present or activated are
only up to the order $A' = \lceil \pi f e r / c \rceil$, and all modes produced

* One-to-one combinations means that the first source location is paired with
the first receiver location, the second source location with the second
receiver location, and so on.





Fig. 6. Actual and reconstructed RTF between a fixed source location $y$
and all points in the receiver region for an overlapping distribution of $\eta$ and
$\zeta$ with $R_{sr} = (0.3, 0.3, 0.3)$ m. (a) Actual and (b) reconstructed RTF for
$y = (0.35, 0.35, 0.3707)$ m. (c) Actual and (d) reconstructed RTF for
$y = (0.45, 0.45, 0.4207)$ m.


beyond $A'$ will be erroneous due to the $1/j_a(\cdot)$ term in (17).
In order to minimize these errors, the higher order modes
can be discarded as they are simply absent in the actual
soundfield. The same technique can be applied to the larger
microphone array when calculating the soundfield at $\eta$. When
the inactive modes are discarded as explained above, the
matrix dimensions of (24) will vary with frequency.
An extensive study of this solution and the resulting
improvement in array robustness to noise is presented in [38]
for the 2D case.

In order to verify the geometrical flexibility of the proposed
parameterization, we repeated the same process for a different
configuration of $\eta$ and $\zeta$. This was done by re-defining the
vector from $O$ to $O_S$ as $R_{sr} = (0.3, 0.3, 0.3)$ m, which
resulted in $\eta$ and $\zeta$ overlapping each other. All other
design parameters remained the same, which left
the loudspeaker and microphone array parameters unchanged.
Figures 6(a) and 6(b) show the actual and reconstructed RTF
between a source at $y = (0.35, 0.35, 0.3707)$ m and all points
in the receiver region for a frequency of $f = 900$ Hz (the
circle represents the receiver region). Similarly, Figs. 6(c) and
6(d) show the corresponding results for a secondary source
at $y = (0.45, 0.45, 0.4207)$ m. The reconstructed results in
both cases appear almost error-free, verifying the geometrical
flexibility of the proposed parameterization. However, when
$\eta$ and $\zeta$ overlap each other, extra caution should be
taken to avoid potential conflicts between the loudspeaker and
microphone locations. Furthermore, when a loudspeaker is too
near to a microphone, there will be potential errors stemming
from nearfield truncation [39].
V. Conclusion

We have introduced a novel method to parameterize the
three dimensional RTF between two arbitrary points from a
sizeable spatial region where the source(s) lie and a sizeable
spatial region where the receiver(s) lie. The modal based
parameterization only requires a finite number of RTF coefficients
to describe an infinite number of RTFs between the two
regions and therefore, it can also be used to characterize an
entire room at once. However, when an arbitrarily shaped room
is being measured for RTF parameterization, the corresponding
microphone and loudspeaker array geometries may be altered
to a similar geometry along (or close to) the room walls.
We also presented a practical method of extracting the RTF
coefficients which only requires a single loudspeaker and
a single microphone, provided the room acoustics remain
stationary. This result substantially simplifies the problem of
room equalization by simplifying the RTF measurements. It
also has the added advantage of providing robustness to both
source variations and receiver variations.